Introducing risk management into the grid
Service Level Agreements (SLAs) are explicit statements about all expectations and obligations in the business partnership between customers and providers. They have been introduced in Grid computing to overcome the best-effort approach, making the Grid more attractive for commercial applications. However, decisions on negotiation and system management still rely on static approaches that do not reflect the risk associated with those decisions. The EC-funded project "AssessGrid" aims at introducing risk assessment and management as a novel decision paradigm into Grid computing. This paper gives a general motivation for risk management and presents the envisaged architecture of a "risk-aware" Grid middleware and Grid fabric, highlighting its functionality by means of three showcase scenarios.
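The core idea of pricing risk into SLA decisions can be illustrated with a minimal sketch. This is not AssessGrid's actual model; the `SlaOffer` fields, the expected-profit formula, and the acceptance threshold are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SlaOffer:
    price: float                # fee the provider charges for running the job
    penalty: float              # fee owed to the customer if the SLA is violated
    failure_probability: float  # provider's estimated probability of violation

def expected_provider_profit(offer: SlaOffer) -> float:
    """Risk-adjusted profit: the penalty is weighted by the failure probability."""
    return offer.price - offer.failure_probability * offer.penalty

def accept_offer(offer: SlaOffer, min_profit: float = 0.0) -> bool:
    """A risk-aware provider accepts only if the expected profit clears a threshold."""
    return expected_provider_profit(offer) >= min_profit

# Same price and penalty, but very different risk:
risky = SlaOffer(price=10.0, penalty=50.0, failure_probability=0.3)
safe = SlaOffer(price=10.0, penalty=50.0, failure_probability=0.05)
```

A purely static policy would treat both offers identically; a risk-aware one rejects the first (expected profit 10 − 0.3·50 = −5) and accepts the second (10 − 0.05·50 = 7.5).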
Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems
Stream processing has become a critical component in the architecture of
modern applications. With the exponential growth of data generation from
sources such as the Internet of Things, business intelligence, and
telecommunications, real-time processing of unbounded data streams has become a
necessity. Distributed stream processing (DSP) systems provide a solution to
this challenge, offering high horizontal scalability, fault-tolerant execution,
and the ability to process data streams from multiple sources in a single DSP
job. Often, however, data streams need to be enriched with extra information
for correct processing, which introduces additional dependencies and potential
bottlenecks.
In this paper, we present an in-depth evaluation of data enrichment methods
for DSP systems and identify the different use cases for stream processing in
modern systems. Using a representative DSP system and conducting the evaluation
in a realistic cloud environment, we found that moving enrichment data into the
DSP system can improve performance for specific use cases, but at the cost of
increased resource consumption. This highlights the need for stream processing
solutions specifically designed for the performance-intensive workloads of
cloud-based applications.
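The trade-off between enrichment strategies can be sketched in plain Python. This is not the paper's benchmark code; the event and table shapes are hypothetical, and a real DSP system would implement the cached variant as operator state rather than a dict.

```python
def enrich_via_lookup(events, lookup):
    """Per-event call to an external store: no extra memory in the DSP job,
    but every event pays the lookup latency and adds a runtime dependency."""
    for event in events:
        yield {**event, "meta": lookup(event["key"])}

def enrich_via_cache(events, table):
    """Enrichment table held inside the DSP job: fast local joins, at the
    cost of keeping the table in operator state/memory."""
    for event in events:
        yield {**event, "meta": table.get(event["key"])}

events = [{"key": "sensor-1", "value": 21.5}, {"key": "sensor-2", "value": 19.0}]
table = {"sensor-1": {"site": "berlin"}, "sensor-2": {"site": "munich"}}
enriched = list(enrich_via_cache(events, table))
```

The first variant keeps the job lean but couples its throughput to an external service; the second is what "moving enrichment data into the DSP system" amounts to, trading memory for latency.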
Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing
Distributed Stream Processing systems are becoming an increasingly essential
part of Big Data processing platforms as users grow ever more reliant on their
ability to provide fast access to new results. As such, making timely decisions
based on these results is dependent on a system's ability to tolerate failure.
Typically, these systems achieve fault tolerance and the ability to recover
automatically from partial failures by implementing checkpoint and rollback
recovery. However, owing to the statistical probability of partial failures
occurring in these distributed environments and the variability of workloads
upon which jobs are expected to operate, static configurations will often not
meet Quality of Service constraints with low overhead.
In this paper we present Khaos, a new approach which utilizes the parallel
processing capabilities of virtual cloud automation technologies for the
automatic runtime optimization of fault tolerance configurations in Distributed
Stream Processing jobs. Our approach employs three consecutive phases that
borrow from the principles of Chaos Engineering: establish the steady-state
processing conditions, conduct experiments to better understand how the system
performs under failure, and use this knowledge to continuously minimize Quality
of Service violations. We implemented Khaos prototypically together with Apache
Flink and demonstrate its usefulness experimentally.
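To see why checkpoint intervals need tuning at all, a classic analytic baseline is Young's approximation for the optimal interval. Khaos learns its configurations from failure-injection experiments rather than this formula; the sketch below only illustrates the underlying trade-off, and the example numbers are assumptions.

```python
import math

def optimal_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: interval = sqrt(2 * C * MTBF).
    Cheaper checkpoints or more frequent failures -> checkpoint more often."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# E.g. if a checkpoint costs 2 s and failures arrive roughly once an hour:
interval = optimal_checkpoint_interval(checkpoint_cost_s=2.0, mtbf_s=3600.0)
# sqrt(2 * 2 * 3600) = 120 s
```

The formula makes the abstract's point concrete: a single static interval cannot be optimal when the checkpoint cost and the effective failure rate vary with the workload.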
Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies
Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various interdependencies of system components, anomalies do not only affect their point of origin but also propagate through the distributed system. Taking this into account, we present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies and placement as edges to improve the identification and localization of anomalies. Given a series of system metrics (KPIs), our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is detected. During our experiments, we simulate a distributed cloud application deployment and synthetically inject anomalies. The evaluation shows the generally good prediction performance of Arvalus and reveals the advantage of D-Arvalus, which incorporates information about system component dependencies.
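The idea of letting anomaly evidence flow along dependency edges can be sketched with a toy message-passing step. This is not D-Arvalus (which is a learned neural model); the scores, damping factor, and three-service graph are all hypothetical.

```python
def propagate_scores(scores, edges, damping=0.5, steps=2):
    """Each node mixes its own anomaly score with the mean score of its
    upstream dependencies, so evidence flows along the dependency graph."""
    for _ in range(steps):
        new = {}
        for node, own in scores.items():
            upstream = [scores[src] for src, dst in edges if dst == node]
            incoming = sum(upstream) / len(upstream) if upstream else own
            new[node] = (1 - damping) * own + damping * incoming
        scores = new
    return scores

# "api" depends on "db", "web" depends on "api"; only "db" looks anomalous.
edges = [("db", "api"), ("api", "web")]
scores = propagate_scores({"db": 1.0, "api": 0.0, "web": 0.0}, edges,
                          damping=0.5, steps=1)
```

After one step the anomaly at "db" has partially propagated to the dependent "api", mirroring the observation that anomalies do not stay at their point of origin.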
Effectively Testing System Configurations of Critical IoT Analytics Pipelines
The emergence of the Internet of Things has seen the introduction of numerous connected devices used for the monitoring and control of even Critical Infrastructures. Distributed stream processing has become key to analyzing data generated by these connected devices and improving our ability to make decisions. However, optimizing these systems towards specific Quality of Service targets is a difficult and time-consuming task, due to the large-scale distributed systems involved, the large number of configuration parameters, and the inability to easily determine the impact of tuning these parameters. In this paper we present an approach for the effective testing of system configurations for critical IoT analytics pipelines. We demonstrate our approach with a prototype called Timon, which is integrated with Kubernetes. This tool allows pipelines to be easily replicated in parallel and evaluated to determine the optimal configuration for specific applications. We demonstrate the usefulness of our approach by investigating different configurations of an exemplary geographically-based traffic monitoring application implemented in Apache Flink.
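Replicating a pipeline once per candidate configuration starts from expanding the parameter space into concrete configurations. A minimal sketch, not Timon's actual interface; the parameter names are hypothetical Flink-style knobs.

```python
from itertools import product

def configuration_grid(params):
    """Expand a dict of parameter-value lists into one config dict per
    combination; each could be deployed as a parallel pipeline replica."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]

# Two values per knob -> 4 pipeline replicas to evaluate side by side.
grid = configuration_grid({
    "parallelism": [2, 4],
    "checkpoint_interval_ms": [10_000, 60_000],
})
```

Running the replicas in parallel on the same input stream is what makes the comparison fair: each configuration sees identical data, so observed QoS differences are attributable to the configuration alone.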
Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs Using Graph Propagation
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance. This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs and, thus, allows for deriving effective rescaling decisions. For this, Enel incorporates descriptive properties that capture the respective execution context, considers statistics from individual dataflow tasks, and propagates predictions through the job graph to eventually find an optimized new scale-out. Our evaluation of Enel with four iterative Spark jobs shows that our approach is able to identify effective rescaling actions, for instance reacting to node failures, and can be reused across different execution contexts.
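Propagating per-task predictions through a job graph to estimate end-to-end runtime under a candidate scale-out can be sketched as follows. This is a simplified stand-in for Enel's learned graph model; the task runtime functions and the three-task DAG are illustrative assumptions.

```python
def predict_job_runtime(tasks, edges, scale_out):
    """Propagate finish-time predictions through the job DAG: a task can
    start once all predecessors are done, so its finish time is its own
    (scale-out dependent) runtime plus the slowest upstream finish time."""
    preds = {t: [s for s, d in edges if d == t] for t in tasks}
    finish = {}
    def finish_time(t):
        if t not in finish:
            finish[t] = tasks[t](scale_out) + max(
                (finish_time(p) for p in preds[t]), default=0.0)
        return finish[t]
    return max(finish_time(t) for t in tasks)

# Hypothetical per-task models: work divides across nodes plus fixed overhead.
tasks = {
    "read": lambda n: 100 / n + 5,
    "map":  lambda n: 200 / n + 5,
    "join": lambda n: 300 / n + 10,
}
edges = [("read", "map"), ("read", "join"), ("map", "join")]
runtime = predict_job_runtime(tasks, edges, scale_out=4)
```

Evaluating this estimate for several candidate scale-outs and picking the cheapest one that still meets the runtime target is the essence of graph-based rescaling decisions.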